matrix factorization
Recovery Guarantee of Non-negative Matrix Factorization via Alternating Updates
Yuanzhi Li, Yingyu Liang, Andrej Risteski
Non-negative matrix factorization is a popular tool for decomposing data into feature and weight matrices under non-negativity constraints. It enjoys practical success but is poorly understood theoretically. This paper proposes an algorithm that alternates between decoding the weights and updating the features, and shows that, assuming a generative model of the data, it provably recovers the ground truth under fairly mild conditions. In particular, its only essential requirement on features is linear independence. Furthermore, the algorithm uses ReLU to exploit the non-negativity for decoding the weights, and thus can tolerate adversarial noise that can potentially be as large as the signal, and can tolerate unbiased noise much larger than the signal. The analysis relies on a carefully designed coupling between two potential functions, which we believe is of independent interest.
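The decoding idea can be illustrated with a minimal sketch. It assumes that decoding is a pseudoinverse followed by a shifted ReLU; the function name `relu_decode` and the threshold `alpha` are hypothetical and are not taken from the paper, and the feature-update step is omitted entirely.

```python
import numpy as np

def relu_decode(A, y, alpha=0.0):
    """Estimate non-negative weights x from y ≈ A x.

    Assumption: decoding is a pseudoinverse followed by a shifted ReLU.
    `alpha` is a hypothetical threshold, not a value from the paper.
    """
    z = np.linalg.pinv(A) @ y          # exact when A has independent columns and y is noise-free
    return np.maximum(z - alpha, 0.0)  # ReLU removes negative and small entries

rng = np.random.default_rng(0)
A = rng.random((20, 5))                        # random columns are linearly independent almost surely
x_true = np.array([0.0, 1.5, 0.0, 0.7, 0.0])   # sparse, non-negative weights
y_clean = A @ x_true
y_noisy = y_clean + 0.01 * rng.standard_normal(20)

x_clean = relu_decode(A, y_clean)              # noise-free case: recovers x_true up to rounding
x_noisy = relu_decode(A, y_noisy, alpha=0.05)  # noisy case: output stays non-negative
```

In the noise-free case, linear independence of the columns of `A` is what makes the pseudoinverse step exact; the ReLU only matters once noise pushes some entries below zero or slightly above it.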
Efficiently Factorizing Boolean Matrices using Proximal Gradient Descent
Addressing the interpretability problem of NMF on Boolean data, Boolean Matrix Factorization (BMF) uses Boolean algebra to decompose the input into low-rank Boolean factor matrices. These matrices are highly interpretable and very useful in practice, but they come at the high computational cost of solving an NP-hard combinatorial optimization problem. To reduce the computational burden, we propose to relax BMF continuously using a novel elastic-binary regularizer, from which we derive a proximal gradient algorithm. Through an extensive set of experiments, we demonstrate that our method works well in practice: On synthetic data, we show that it converges quickly, recovers the ground truth precisely, and estimates the simulated rank exactly. On real-world data, we improve upon the state of the art in recall, loss, and runtime, and a case study from the medical domain confirms that our results are easily interpretable and semantically meaningful.
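To make the difference between Boolean and ordinary factorization concrete, the sketch below computes a Boolean matrix product, where entry (i, j) is the OR over k of U[i, k] AND V[k, j]. The matrices are made-up examples, and the relaxation and proximal step described in the abstract are not reproduced here.

```python
import numpy as np

def boolean_product(U, V):
    """Boolean matrix product: out[i, j] = OR_k (U[i, k] AND V[k, j])."""
    return (U.astype(int) @ V.astype(int)) > 0

U = np.array([[1, 0],
              [1, 1],
              [0, 1]], dtype=bool)
V = np.array([[1, 1, 0],
              [0, 1, 1]], dtype=bool)

B = boolean_product(U, V)
# Ordinary integer product, for contrast: overlapping factors give entries above 1.
N = U.astype(int) @ V.astype(int)
```

Here `N[1, 1]` is 2 because both factors cover that cell, whereas `B[1, 1]` is simply true. This is why overlapping Boolean factors remain directly readable as sets of rows and columns.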
Anchor-Free Correlated Topic Modeling: Identifiability and Algorithm
Kejun Huang, Xiao Fu, Nikolaos D. Sidiropoulos
In topic modeling, many algorithms that guarantee identifiability of the topics have been developed under the premise that there exist anchor words, i.e., words that only appear (with positive probability) in one topic. Follow-up work has resorted to third- or higher-order statistics of the data corpus to relax the anchor word assumption. Reliable estimates of higher-order statistics are hard to obtain, however, and the identification of topics under those models hinges on uncorrelatedness of the topics, which can be unrealistic. This paper revisits topic modeling based on second-order moments, and proposes an anchor-free topic mining framework. The proposed approach guarantees the identification of the topics under a much milder condition compared to the anchor-word assumption, thereby exhibiting much better robustness in practice. The associated algorithm only involves one eigendecomposition and a few small linear programs. This makes it easy to implement and scale up to very large problem instances. Experiments using the TDT2 and Reuters-21578 corpora demonstrate that the proposed anchor-free approach exhibits very favorable performance (measured using coherence, similarity count, and clustering accuracy metrics) compared to the prior art.
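The role of the single eigendecomposition can be illustrated on synthetic data. The sketch assumes the second-order moment model P = C E C^T, with C a word-topic matrix and E a topic correlation matrix; this notation, the variable names, and the Dirichlet sampling are assumptions made for illustration. It is not the paper's algorithm, and the linear programs are left out.

```python
import numpy as np

rng = np.random.default_rng(0)
n_words, n_topics, n_docs = 30, 4, 500

# C: word-topic matrix; each column is a distribution over words.
C = rng.dirichlet(np.ones(n_words), size=n_topics).T        # shape (n_words, n_topics)
# W: per-document topic weights; E is their second-order moment.
W = rng.dirichlet(0.3 * np.ones(n_topics), size=n_docs).T   # shape (n_topics, n_docs)
E = (W @ W.T) / n_docs

# Assumed model: word co-occurrence P = C E C^T, so rank(P) <= n_topics.
P = C @ E @ C.T

eigvals, eigvecs = np.linalg.eigh(P)   # eigenvalues in ascending order
top = eigvecs[:, -n_topics:]           # orthonormal basis for the column space of P
# The topic columns of C lie in the span of the top eigenvectors.
residual = C - top @ (top.T @ C)
```

Under this model the eigendecomposition pins down the subspace that contains the topics. Picking the actual topic vectors inside that subspace is the part that needs an identifiability condition.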
Sparse Network Inference under Imperfect Detection and its Application to Ecological Networks
Zhang, Aoran, Wei, Tianyao, Guerrero, Maria J., Uribe, César A.
Abstract--Recovering latent structure from count data has received considerable attention in network inference, particularly when one seeks both cross-group interactions and within-group similarity patterns in bipartite networks, which are widely used in ecology research. Such networks are often sparse and inherently imperfect in their detection. Existing models mainly focus on interaction recovery, while the induced similarity graphs are much less studied. Moreover, sparsity is often not controlled and scale is unbalanced, leading to oversparse or poorly rescaled estimates with degraded structural recovery. We impose nonconvex $\ell_{1/2}$ regularization on the latent similarity and connectivity structures to promote sparsity in within-group similarity and cross-group connectivity with better relative scale. To solve the resulting problem, we develop an ADMM-based algorithm with adaptive penalization and scale-aware initialization, and establish its asymptotic feasibility and the KKT stationarity of cluster points under mild regularity conditions. Experiments on synthetic and real-world ecological datasets demonstrate improved recovery of latent factors and similarity/connectivity structure relative to existing baselines.
Index Terms--augmented Lagrangian, nonconvex nonsmooth optimization, nonnegative matrix factorization, link prediction, ecological network inference, structured sparse recovery
I. INTRODUCTION
This setting is inherent in sensing and monitoring applications [3], [4], where observations, such as counts, are obtained via an imperfect sampling process. In this paper, we are interested in ecological interaction networks describing how species associate with locations and how environments shape biodiversity patterns [5], [6].
Nonnegative Matrix Factorization in the Component-Wise L1 Norm for Sparse Data
Seraghiti, Giovanni, Dubrulle, Kévin, Vandaele, Arnaud, Gillis, Nicolas
Nonnegative matrix factorization (NMF) approximates a nonnegative matrix, $X$, by the product of two nonnegative factors, $WH$, where $W$ has $r$ columns and $H$ has $r$ rows. In this paper, we consider NMF using the component-wise L1 norm as the error measure (L1-NMF), which is suited for data corrupted by heavy-tailed noise, such as Laplace noise or salt-and-pepper noise, or in the presence of outliers. Our first contribution is an NP-hardness proof for L1-NMF, even when $r=1$, in contrast to the standard NMF that uses least squares. Our second contribution is to show that L1-NMF strongly enforces sparsity in the factors for sparse input matrices, thereby favoring interpretability. However, if the data is affected by false zeros, overly sparse solutions might degrade the model. Our third contribution is a new, more general, L1-NMF model for sparse data, dubbed weighted L1-NMF (wL1-NMF), where the sparsity of the factorization is controlled by adding a penalization parameter to the entries of $WH$ associated with zeros in the data. The fourth contribution is a new coordinate descent (CD) approach for wL1-NMF, denoted as sparse CD (sCD), where each subproblem is solved by a weighted median algorithm. To the best of our knowledge, sCD is the first algorithm for L1-NMF whose complexity scales with the number of nonzero entries in the data, making it efficient in handling large-scale, sparse data. We perform extensive numerical experiments on synthetic and real-world data to show the effectiveness of our newly proposed model (wL1-NMF) and algorithm (sCD).
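The link between L1 fitting and weighted medians can be seen on the simplest scalar subproblem: minimizing sum_i |x_i - w_i h| over h. For w_i > 0 each term equals w_i |x_i / w_i - h|, so a minimizer is a weighted median of the ratios x_i / w_i with weights w_i. The sketch below covers only this plain, unweighted-penalty case with hypothetical function names; it is not the sCD algorithm and does not exploit sparsity.

```python
import numpy as np

def weighted_median(values, weights):
    """Return a minimizer of sum_i weights[i] * |values[i] - t| over t."""
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cum = np.cumsum(w)
    # First sorted value at which the cumulative weight reaches half of the total.
    idx = np.searchsorted(cum, 0.5 * cum[-1])
    return v[idx]

def l1_rank_one_scalar(x, w):
    """Minimize sum_i |x[i] - w[i] * h| over h, for w >= 0 with a positive entry.

    Entries with w[i] == 0 contribute a constant and are ignored.
    """
    mask = w > 0
    return weighted_median(x[mask] / w[mask], w[mask])

x = np.array([1.0, 4.0, 0.0, 9.0, 2.0])
w = np.array([1.0, 2.0, 0.5, 3.0, 0.0])
h = l1_rank_one_scalar(x, w)   # ratios 1, 2, 0, 3 with weights 1, 2, 0.5, 3 give h = 2.0
```

When `x` and `w` are nonnegative, all ratios are nonnegative, so the minimizer automatically satisfies the nonnegativity constraint on the factor entry.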
Near-Optimal Smoothing of Structured Conditional Probability Matrices
Moein Falahatgar, Mesrob I. Ohannessian, Alon Orlitsky
Utilizing the structure of a probabilistic model can significantly increase its learning speed. Motivated by several recent applications, in particular bigram models in language processing, we consider learning low-rank conditional probability matrices under expected KL-risk. This choice makes smoothing, that is, the careful handling of low-probability elements, paramount. We derive an iterative algorithm that extends classical non-negative matrix factorization to naturally incorporate additive smoothing and prove that it converges to the stationary points of a penalized empirical risk. We then derive sample-complexity bounds for the global minimizer of the penalized risk and show that it is within a small factor of the optimal sample complexity.
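Why KL-risk makes smoothing essential can be shown with plain additive (add-beta) smoothing on a toy bigram count matrix. The counts, the value of `beta`, and the function names are assumptions chosen for illustration; the low-rank algorithm from the abstract is not reproduced.

```python
import numpy as np

def additive_smoothing(counts, beta=0.5):
    """Row-wise add-beta estimate of P(next | current) from bigram counts.

    `beta` is a hypothetical smoothing constant chosen for illustration.
    """
    k = counts.shape[1]
    return (counts + beta) / (counts.sum(axis=1, keepdims=True) + beta * k)

def kl_divergence(p, q):
    """KL(p || q) for one pair of distributions; terms with p == 0 contribute 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

counts = np.array([[3.0, 0.0, 1.0],
                   [0.0, 2.0, 2.0]])
P_hat = additive_smoothing(counts, beta=0.5)

# A made-up true distribution for row 0 that puts mass on the unseen second symbol.
p_true = np.array([0.6, 0.1, 0.3])
risk_row0 = kl_divergence(p_true, P_hat[0])
```

Without smoothing, the empirical estimate for row 0 would assign probability zero to the second symbol, and the KL divergence from `p_true` would be infinite. With `beta > 0` every entry of `P_hat` is positive, so the risk stays finite.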